Introduction

This selfstudy assignment should be answered using a single Rmd file which can be run on one of the university servers js{01,02,03,04} in a resonable amount of time. Please add student name in the yaml header (if you are a group of students working together you are allowed to only hand in one answer with all your names on it).

To embed Rcpp code in your Rmd file use the chunk option engine='Rcpp' as below:

#include <Rcpp.h>
// [[Rcpp::export]]
int fibonacci(const int x) {
    if (x == 0 || x == 1) return(x);
    return (fibonacci(x - 1)) + fibonacci(x - 2);
}
fibonacci(4)
## [1] 3

To complete the exercises you need both to fill in a bit of LaTeX math formulas and some R and C++ code.

Monte Carlo integration

Problem A: Gaussian tail probability

Consider the tail probability \(P(X>4)\) where \(X\sim N(0,1)\).

  • Express the probability as an integral of an indicator function over the real line and provide a simple Monte Carlo (MC) estimate of the probability along with the MC standard error based on \(100,000\) simulations.

\(P(X>4) = E[1_{X>4}] = \int_{\mathbb{R}} f(x) 1_{x>4} dx\)

set.seed(123)
dat <- rnorm(100000)

indic_mean <- mean(dat>4);indic_mean # Expectation of indicator
## [1] 0.00001
sd(dat) # close to 1
## [1] 0.9997334
pnorm(4, lower.tail = F)
## [1] 0.00003167124
  • Make a change of variable \(u=1/x\) and express the integral with respect to the uniform distribution on \((0,1/4)\) and use this to obtain a more accurate MC estimate (report both MC estimate and MC std. error).

unif_data <- runif(10000, min = 0, max = 0.25)

unif_result <- 0.25 * 1/sqrt(2*pi) * exp(-1/(2*unif_data^2)) * 1/unif_data^2

mean(unif_result)
## [1] 0.00003274824
sd(unif_result)/sqrt(length(unif_result))
## [1] 0.0000009248579

Problem B: Complicated regions of integration

  • Suppose we have a collection of (possibly overlapping) discs with centers in \([0,1]^2\) and we wish to estimate the fraction of the unit square covered by these.

Give a MC estimate and MC std. error for the fraction covered by the following discs:

set.seed(42)
n <- 1000
centers <- data.frame(x=runif(n), y=runif(n))
r <- .03
par(pty = "s")
plot(centers, pch = ".", axes = FALSE, xlab = "", ylab = "")
symbols(centers, circles = rep(r, n), inches = FALSE, bg = gray(.5, .5), add = TRUE)
rect(0,0,1,1, lwd = 3)

afstand_centers <- function(x) {
  vec <- (x-centers)
  output <- apply(vec, 1, norm,type="2")
  return(output)
}


centers_mc <- data.frame(x=runif(100), y=runif(100))

mc_rslt <- apply(centers_mc, 1, afstand_centers)


mean(apply(mc_rslt, 2, min) < r)
## [1] 0.96
sd(apply(mc_rslt, 2, min)< r)/sqrt(100)
## [1] 0.01969464
  • If you implement this with a for loop I suggest trying to do it in C++.

  • Consider whether sorting the centers according to e.g. x-coordinate can be used to speed up calculations, and possibly implement this to see if it works.

Answer:

You could gain speed by not checking the points that are too far away to have the necesarry distance. I.e. if the distance to the x coordinate alone is too much then there is no point calculating the distance.

  • Explain how considering a single circle of radius 0.5 centered in (0.5,0.5) could be used to estimate \(\pi=3.14...\) using MC.

Answer:

Using the same idea as before but with only 1 center and the appropriate r, gives the relation of the area of the circle wrt. the square. Then \(\pi\) is isolated in the equation for the area of a circle, i.e \(\pi =\frac{A}{r^2}\).

Problem C: Combining MC estimates

  • Assume we have \(k\) MC estimates \(m_i = 1/n_i\sum_{j=1}^{n_i} y_i\) with std. errors \(\text{se}_i\) based on sample size \(n_i\), \(i=1,\dots,k\) which are all independent estimates of the same mean value. How can you construct a combined estimate and standard error based on the total sample size \(n_{\bullet} = \sum_{j=1}^k n_i\)?

Answer:

We see two different ideas, a simple mean of the means, or a weighted mean of the means, weighted by \(n_i\). Same for standard error.

  • Use mclapply() (or another parallel processing function) to run 10 independent MC estimates of the integral in problem A and use the formula derived above to combine these to a single MC estimate and MC std. error.
library(parallel)
set.seed(123)

indicator_mean_para <- function(n) {
  data <- rnorm(n)
  return(mean(data>4))
}

vec_para <- rep(10000, 10)
result <- mclapply(vec_para, indicator_mean_para);result
## [[1]]
## [1] 0
## 
## [[2]]
## [1] 0
## 
## [[3]]
## [1] 0.0001
## 
## [[4]]
## [1] 0
## 
## [[5]]
## [1] 0
## 
## [[6]]
## [1] 0
## 
## [[7]]
## [1] 0
## 
## [[8]]
## [1] 0
## 
## [[9]]
## [1] 0
## 
## [[10]]
## [1] 0
mean(unlist(result))
## [1] 0.00001
sd(unlist(result))/sqrt(length(unlist(result)))
## [1] 0.00001
cl <- makeCluster(2)
para_rslt <- parSapply(cl, vec_para, indicator_mean_para)
para_rslt
##  [1] 0.0000 0.0000 0.0000 0.0001 0.0000 0.0000 0.0000 0.0001 0.0000 0.0001
microbenchmark::microbenchmark(mclapply(vec_para, indicator_mean_para),
                               sapply(vec_para, indicator_mean_para),
                               parSapply(cl, vec_para, indicator_mean_para), times = 5)
## Unit: milliseconds
##                                          expr    min     lq    mean median
##       mclapply(vec_para, indicator_mean_para) 6.2735 6.2830 6.39722 6.3388
##         sapply(vec_para, indicator_mean_para) 6.3015 6.3114 6.41092 6.3854
##  parSapply(cl, vec_para, indicator_mean_para) 4.3445 4.3458 4.40922 4.3590
##      uq    max neval
##  6.4532 6.6376     5
##  6.5060 6.5503     5
##  4.4079 4.5889     5
stopCluster(cl)

Rcpp

Problem D: Vector auto-regressive process

Consider a vector auto-regressive process of order 1, VAR(1) in \(\mathbb{R}^d\) with \(d\times d\) lag parameter matrix \(A\): \[x_t = Ax_{t-1} + e^t\] where \(e_t\) is a process of iid multivariate Gaussian variables \(e^t \sim N_d(\mu, \Sigma)\).

Given the starting state \(x_0 = 0\) we wish to simulate a realization of \(\{x_t\}_{t=1}^n\).

  • Verify that the following R function indeed generates an \(d \times n\) matrix containing n multivariate Gaussian vectors with mean vector mu and covariance matrix Sigma (simply check that the empirical covariance matrix is close to the given one for a specific 2x2 case of your choice):
mvrnormR <- function(mu, Sigma, n) {
  d <- length(mu)
  stopifnot(nrow(Sigma) == d)
  std_norm <- matrix(rnorm(n*d), nrow = d, ncol = n)
  mu + t(chol(Sigma)) %*% std_norm
}
mvnorm_data <- mvrnormR(c(1,2), matrix(c(1,0,0,4), nrow = 2), 1000)

var(t(mvnorm_data))
##            [,1]       [,2]
## [1,] 1.10560046 0.07200583
## [2,] 0.07200583 3.90770029
  • Use RcppArmadillo to implement this as a new function mvrnormcpp() and benchmark the two functions for the following input (check that they approximately have the same distribution – you can only use all.equal() to check if you have managed to use the same random numbers in both cases):
d <- 2
Sigma <- matrix(c(2,1,1,4), nrow = 2, ncol = 2)
n <- 1000
set.seed(42)
mu <- c(10, 20)
#include <RcppArmadillo.h>
//[[Rcpp::depends(RcppArmadillo)]]

using namespace Rcpp ;
using namespace arma ;


// [[Rcpp::export]]
mat mvrnormRcpp(colvec mu, mat Sigma, int n) {
  int d = mu.size();
  mat std_norm = mat(d,n,fill::randn);
  mat chol_sigma = chol(Sigma);
  mat output_pre_mu = chol_sigma.t() * std_norm;
  mat mu_reppet = mat(d,0);
  for (int i = 0;i<n;i++){
    mu_reppet.insert_cols(i, mu);
  }
  mat output = mu_reppet + output_pre_mu;
  return output;
}


// [[Rcpp::export]]
mat simcpp(mat A, mat Sigma, int n) {
  vec mu = vec(Sigma.n_rows);
  mu.fill(0);
  mat errors = mvrnormRcpp(mu, Sigma, n);
  mat simdata = mat(mu.size(),n,fill::zeros);
  for (int i = 1; i < n; i++){
    simdata.col(i) = A * simdata.col(i-1) + errors.col(i);
  }
  return simdata;
}
test_1 <- mvrnormR(mu, Sigma, n)
test_2 <- mvrnormRcpp(mu, Sigma, n)

var(t(test_1))
##           [,1]      [,2]
## [1,] 1.9770650 0.9226675
## [2,] 0.9226675 3.8900227
var(t(test_2))
##           [,1]      [,2]
## [1,] 2.0305411 0.9583243
## [2,] 0.9583243 3.9712797
microbenchmark::microbenchmark(mvrnormR(mu, Sigma, n), mvrnormRcpp(mu, Sigma, n))
## Unit: microseconds
##                       expr    min      lq     mean  median      uq    max neval
##     mvrnormR(mu, Sigma, n)  140.1  146.50  164.847  155.95  172.45  309.8   100
##  mvrnormRcpp(mu, Sigma, n) 1487.8 1501.25 1556.854 1514.55 1561.30 3090.4   100

Simulation of VAR(1) is obtained with the R function simR() below. Implement this as a C++ function simcpp() which calls mvrnormcpp() you made before and benchmark the C++ and R implementations. We will use A <- matrix(c(0.7,0.2,0.2,0.7),2,2) as the coefficient matrix.

simR <- function(A, Sigma, n) {
  mu <- rep(0, nrow(Sigma))
  errors <- mvrnormR(mu, Sigma, n)
  simdata <- matrix(0, length(mu), n)
  for (col in 2:ncol(errors)) {
    simdata[,col] = A %*% simdata[,(col-1)] + errors[,col]
  }
  return(simdata)
}
set.seed(42)
Sigma <- matrix(c(2,1,1,4), nrow = 2, ncol = 2)
n <- 100
A <- matrix(c(0.7,0.2,0.2,0.7),2,2)


# The results are not equal for the two methods, we have tried with different methods of setting the seed, however we can't figure out how to get them to be equal. We have however tried using matrices filled with 1's and using those we achieved the same paths for both the R and Rcpp implementation.

simR(A, Sigma, n)
##      [,1]      [,2]     [,3]     [,4]     [,5]     [,6]     [,7]     [,8]
## [1,]    0 0.5135411 1.219350 3.230870 5.511006 6.370726 4.223284 3.931401
## [2,]    0 1.4407481 1.198552 1.974573 3.338289 8.639607 5.818232 6.012901
##            [,9]     [,10]     [,11]        [,12]     [,13]      [,14]
## [1,]  3.5525662 -0.999637 -0.867053 -1.413478499 1.6918786  0.8716697
## [2,] -0.1754595  1.331727 -2.817068  0.005526308 0.2559429 -2.9629573
##           [,15]      [,16]      [,17]     [,18]      [,19]     [,20]     [,21]
## [1,]  0.6682533  0.5575372  1.8209593  1.906412 -0.3307621 -4.387902 -3.640855
## [2,] -2.7717190 -0.1658708 -0.4118704 -2.779289 -3.7108188 -4.303249 -4.419668
##          [,22]     [,23]    [,24]     [,25]    [,26]      [,27]      [,28]
## [1,] -2.360328 -4.516346 -5.08525 -4.468328 -2.90061 -0.3881402 -0.2076790
## [2,] -4.645377 -3.881622 -1.49248 -1.140260 -2.93064 -0.3146082  0.2829957
##          [,29]      [,30]     [,31]     [,32]     [,33]      [,34]    [,35]
## [1,] 0.8718833 -3.4615597 -3.111565 -1.594750 -1.830952 -0.2662401 1.856896
## [2,] 0.8049529 -0.8456234 -1.197388  1.569595  2.702327  3.7057863 4.540495
##          [,36]     [,37]    [,38]    [,39]      [,40]    [,41]    [,42]
## [1,] 0.7327333 1.9233824 0.709481 1.892115  0.5987219 2.466156 2.121197
## [2,] 2.6434062 0.6539461 1.545546 2.634592 -0.4612025 1.349075 1.273943
##           [,43]      [,44]    [,45]    [,46]    [,47]    [,48]      [,49]
## [1,] 0.05059063 0.05161596 1.485316 3.540473 3.848584 2.192579  0.3576843
## [2,] 1.61642316 0.64616679 2.660012 2.252607 5.347315 2.116995 -1.6097870
##           [,50]    [,51]     [,52]     [,53]     [,54]     [,55]     [,56]
## [1,] 0.04153405 1.772150 0.4154262 0.3840584 0.3588356 0.9082054 0.9594242
## [2,] 0.22327560 2.968361 5.1811000 3.4357735 1.9543769 1.7958333 1.6231668
##           [,57]     [,58]     [,59]      [,60]     [,61]     [,62]     [,63]
## [1,] 0.30972122 -2.124035 -2.571588 -3.1247771 -4.122628 -3.471977 -3.486885
## [2,] 0.04154319 -1.798831  3.008288  0.8851053 -3.812467 -5.269616 -5.185614
##          [,64]     [,65]     [,66]     [,67]     [,68]     [,69]     [,70]
## [1,] -4.345805 -6.483931 -5.212753 -5.046056 -2.262383 -2.694786 -2.869817
## [2,] -8.549064 -7.383689 -6.986089 -3.832034 -4.726015 -1.595847 -2.086360
##          [,71]     [,72]     [,73]     [,74]     [,75]     [,76]     [,77]
## [1,] -2.547918 -3.163621 -3.501495 -3.390920 -1.940044 -2.234268 -1.150330
## [2,] -3.755998 -3.508308 -1.298235 -2.759562 -4.093405 -6.184845 -4.462876
##          [,78]     [,79]     [,80]     [,81]     [,82]      [,83]      [,84]
## [1,] -2.359439 -2.862875 -3.993602 -3.616492 -3.287245 -2.4145259 -3.2696352
## [2,] -6.001454 -6.175585 -2.863701 -4.932548 -4.739299 -0.8779393 -0.9488295
##           [,85]     [,86]     [,87]      [,88]      [,89]      [,90]      [,91]
## [1,] -2.3584466 -1.892393 -3.547924 -0.7796689 -2.3343800 -0.7638265 -0.2170768
## [2,]  0.4173741  1.223166  2.404485  1.3029690  0.9345009  0.5170540 -0.7764998
##          [,92]      [,93]      [,94]      [,95]     [,96]     [,97]     [,98]
## [1,] 0.2143178 -0.1998963  0.3516961 -0.9876319 -1.138103 -1.094697 -3.075420
## [2,] 0.2250683 -2.4969247 -0.2555154 -3.6830809 -3.276356 -4.763303 -2.200189
##           [,99]    [,100]
## [1,] -2.0218079 0.9973579
## [2,] -0.7724865 0.5794610
simcpp(A, Sigma, n)
##      [,1]       [,2]       [,3]       [,4]         [,5]       [,6]       [,7]
## [1,]    0  2.6077709  0.6071929 -1.1904342 -2.332671941 -1.8178693 -2.4595390
## [2,]    0 -0.8396652 -0.9726912 -0.8612278 -0.005374285 -0.8008187  0.1301978
##           [,8]      [,9]     [,10]     [,11]      [,12]      [,13]     [,14]
## [1,] -1.853186 -3.083110 -2.525272 -3.116183 -0.8618157 -0.0627064 0.4307718
## [2,] -1.726499 -2.384791 -3.679333 -7.258178 -4.0451162  3.1160260 0.5382245
##           [,15]     [,16]     [,17]      [,18]     [,19]     [,20]     [,21]
## [1,] -0.2248785 -1.990856 -1.586422 -1.8125637 -3.793190 -3.542885 -2.567572
## [2,] -2.5198800 -2.548637 -0.847225  0.6445216 -1.843362 -1.717154 -2.676938
##          [,22]     [,23]     [,24]     [,25]     [,26]     [,27]     [,28]
## [1,] -4.190357 -2.517130 -3.464956 -5.318140 -4.846469 -5.692087 -5.775178
## [2,] -6.985798 -2.881131 -6.189841 -4.310736 -5.572288 -4.892774 -6.833105
##          [,29]      [,30]     [,31]     [,32]     [,33]     [,34]     [,35]
## [1,] -5.831660  -6.034522 -6.052726 -6.414479 -7.674746 -5.116875 -3.936200
## [2,] -4.243981 -10.532306 -6.893222 -5.511279 -5.387241 -4.861653 -4.639084
##          [,36]     [,37]     [,38]      [,39]     [,40]      [,41]     [,42]
## [1,] -3.113123 -1.327967 -1.589138 -0.6059357 -1.675905 -1.8223759 -1.020934
## [2,] -3.774464 -2.976614 -4.906706 -4.4884413 -3.856152 -0.4067211 -2.566372
##           [,43]     [,44]     [,45]     [,46]     [,47]     [,48]     [,49]
## [1,] -0.6544794 -1.690895 -2.896684 -2.364319 -3.982514 -6.701732 -4.173244
## [2,] -2.1790918 -1.244442 -3.338028 -6.650214 -6.518828 -4.849553 -1.484200
##          [,50]     [,51]     [,52]     [,53]     [,54]     [,55]     [,56]
## [1,] -3.444134 -2.708290 -1.193070 -1.449893 -1.879100 -1.677753 -2.060191
## [2,] -5.228902 -4.727018 -1.930378 -3.323071 -2.699356 -4.947111 -6.190022
##          [,57]      [,58]     [,59]     [,60]     [,61]     [,62]     [,63]
## [1,] -2.154310 -0.8536856 -2.823040 -2.526831 -2.327402 -2.013383 -1.143463
## [2,] -1.598533 -3.1859946 -5.650762 -5.765853 -4.665237 -2.943199 -3.163323
##          [,64]     [,65]     [,66]     [,67]     [,68]     [,69]     [,70]
## [1,] -3.287741 -2.122614 -2.388447 -1.103937 -3.082572 0.1650339 0.2385417
## [2,] -5.092242 -5.279899 -1.025130 -3.033470 -2.166680 1.6298850 1.5156479
##         [,71]     [,72]    [,73]     [,74]    [,75]      [,76]     [,77]
## [1,] 2.398778 0.4573084 0.193190 0.3567415 3.160583 3.76160152 1.7352085
## [2,] 4.227851 2.2238328 4.663105 4.6584806 4.204081 0.08719731 0.2441115
##          [,78]     [,79]     [,80]     [,81]     [,82]      [,83]     [,84]
## [1,]  1.241556  1.725473  3.509477  2.104847 -3.049875 -5.7207423 -1.928403
## [2,] -2.909087 -2.418882 -5.179856 -3.713294 -2.058053  0.8456378 -1.617221
##          [,85]     [,86]      [,87]      [,88]      [,89]     [,90]     [,91]
## [1,] -1.978317 -1.997106 -3.6897672 -3.8721293 -3.9029739 -5.124172 -1.718046
## [2,] -1.970988 -5.689688 -0.3135198 -0.1950831 -0.6252521 -3.286679 -2.447850
##          [,92]     [,93]     [,94]     [,95]      [,96]     [,97]      [,98]
## [1,] -2.740944 -3.959008 -3.177886 -3.860210 -2.0620192 -2.104946 -1.6046242
## [2,] -4.811102 -5.751575 -1.223788 -2.098131 -0.8564968 -2.596450  0.2239889
##           [,99]     [,100]
## [1,] -0.7923259  0.1611405
## [2,]  3.5891774 -1.8003140
microbenchmark::microbenchmark(simR(A, Sigma, n), simcpp(A, Sigma, n))
## Unit: microseconds
##                 expr   min    lq    mean median     uq    max neval
##    simR(A, Sigma, n) 154.7 162.4 178.534 167.75 176.70  635.4   100
##  simcpp(A, Sigma, n)  40.1  42.9  59.278  45.00  49.55 1104.8   100